Unsupervised Language Model Adaptation Incorporating Named Entity Information
Authors
Abstract
Language model (LM) adaptation is important for both speech and language processing. It is often achieved by combining a generic LM with a topic-specific model that is more relevant to the target document. Unlike previous work on unsupervised LM adaptation, this paper investigates how effectively using named entity (NE) information, instead of considering all the words, helps LM adaptation. We evaluate two latent topic analysis approaches in this paper, namely, clustering and Latent Dirichlet Allocation (LDA). In addition, a new dynamically adapted weighting scheme for topic mixture models is proposed based on LDA topic analysis. Our experimental results show that the NE-driven LM adaptation framework outperforms the baseline generic LM. The best result is obtained using the LDA-based approach by expanding the named entities with syntactically filtered words, together with using a large number of topics, which yields a perplexity reduction of 14.23% compared to the baseline generic LM.
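The combination of a generic LM with topic-specific models, and the perplexity comparison used to evaluate it, can be illustrated with a minimal sketch. It assumes linear interpolation over toy unigram models; the abstract does not give the paper's exact formulation, so the function names, the interpolation weight `lam`, the fixed topic weights, and all probabilities below are hypothetical. In the paper the topic weights are adapted dynamically from LDA topic analysis, which is not reproduced here.

```python
import math

def interpolate(generic, topic_models, topic_weights, lam):
    """Mix a generic unigram LM with a weighted mixture of topic LMs.

    Assumed form: P(w) = lam * P_generic(w) + (1 - lam) * sum_k weight_k * P_k(w)
    """
    def prob(word):
        topic_p = sum(w * m.get(word, 0.0)
                      for w, m in zip(topic_weights, topic_models))
        return lam * generic.get(word, 0.0) + (1.0 - lam) * topic_p
    return prob

def perplexity(prob, words):
    """Perplexity of a word sequence under a word -> probability function."""
    log_sum = sum(math.log(prob(w)) for w in words)
    return math.exp(-log_sum / len(words))

# Toy models (hypothetical numbers, each sums to 1 over the vocabulary).
generic = {"the": 0.5, "bank": 0.25, "river": 0.25}
topic_finance = {"the": 0.5, "bank": 0.4, "river": 0.1}
topic_nature = {"the": 0.5, "bank": 0.1, "river": 0.4}

# Topic weights are fixed by hand here; they stand in for weights
# that a topic analysis of the target document would supply.
adapted = interpolate(generic, [topic_finance, topic_nature], [0.9, 0.1], lam=0.5)

document = ["the", "bank", "bank"]
generic_ppl = perplexity(lambda w: generic[w], document)
adapted_ppl = perplexity(adapted, document)
reduction = 100.0 * (generic_ppl - adapted_ppl) / generic_ppl
print(f"generic: {generic_ppl:.3f}, adapted: {adapted_ppl:.3f}, "
      f"reduction: {reduction:.2f}%")
```

Because the toy document is dominated by the finance topic, weighting that topic heavily raises the probability of "bank" and lowers perplexity relative to the generic model. The relative reduction is computed the same way as the percentage reported in the abstract, but the numbers here are illustrative only.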
Related resources
Repérage des entités nommées pour l'arabe : adaptation non-supervisée et combinaison de systèmes (Named Entity Recognition for Arabic : Unsupervised adaptation and Systems combination) [in French]
The recognition of Arabic Named Entities (NE) is a potentially useful preprocessing step for many Natural Language Processing Applications, such as Machine Translation. This task is however made very complex by some peculiarities of the Arabic language. In this paper, we present a summary of our recent efforts...
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three kinds of methods have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on Persian if they are trained with good features. To get good performanc...
Statistical Named Entity Recognizer Adaptation
Named entity recognition (NER) is a subtask of information extraction (IE) whose utility is widely recognized. NER has been explored in depth to provide rapid characterization of newswire data (Sundheim, 1995; Palmer and Day, 1997). The NER task involves both identification of spans of text referring to named entities, and categorization of these entities into classes based on the role they fill in c...
Une approche non supervisée pour le typage et la validation d'une réponse à une question en langage naturel : application à la tâche Entity de TREC 2010
Searching for named entities has been the subject of much research in information retrieval. In this paper, we seek to determine whether a named entity is of a given type and to what extent. We propose to address this issue with an unsupervised, web-oriented language modeling approach. In addition, we want to determine if this new information can be used to improve the ranking of candidate...
Accurate Unsupervised Joint Named-Entity Extraction from Unaligned Parallel Text
We present a new approach to named-entity recognition that jointly learns to identify named-entities in parallel text. The system generates seed candidates through local, cross-language edit likelihood and then bootstraps to make broad predictions across both languages, optimizing combined contextual, word-shape and alignment models. It is completely unsupervised, with no manually labeled items...